DIP-Python tutorials for image processing and machine learning(69)-BOVW
Learned from the YouTube creator DigitalSreeni.
69 - Image classification using Bag of Visual Words -BOVW-
It is used for image classification, not pixel-level segmentation.
All cell images resized to 128 x 128
Images used for testing are completely different from the ones used for training.
136 images per class for testing, parasitized and uninfected (136 x 2)
104 images per class for training, parasitized and uninfected (104 x 2)
GitHub cannot hold the full dataset, so only 10 images of each class were uploaded.
Download full dataset from: ftp://lhcftp.nlm.nih.gov/Open-Access-Datasets/Malaria/cell_images.zip
This link does not seem to open. An alternative source: https://www.kaggle.com/datasets/iarunava/cell-images-for-detecting-malaria?resource=download
Train_BOVW
- Get the training classes names and store them in a list
- Here we use folder names for class names
- Get path to all images and save them in a list
- image_paths holds the paths; a second list holds the corresponding label for each path
- To make it easy to list all file names in a directory let us define a function
- Fill the empty placeholder lists with the image paths and class labels, and assign each class an ID number
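The folder-walking steps above can be sketched with the standard library alone. This is a minimal sketch under stated assumptions, not the tutorial's original code: the helper names `list_files` and `collect_labeled_paths` are made up for illustration, and the throwaway directory tree only stands in for `images/cell_images/train`.

```python
import os
import tempfile


def list_files(path):
    """Return the full path of every entry in a directory, in sorted order."""
    return [os.path.join(path, name) for name in sorted(os.listdir(path))]


def collect_labeled_paths(train_path):
    """Treat each sub-folder of train_path as one class.

    Returns (class_names, image_paths, image_classes), where
    image_classes[i] is the integer class ID of image_paths[i].
    """
    class_names = sorted(os.listdir(train_path))
    image_paths, image_classes = [], []
    for class_id, name in enumerate(class_names):
        files = list_files(os.path.join(train_path, name))
        image_paths += files
        image_classes += [class_id] * len(files)
    return class_names, image_paths, image_classes


# Demo on a throwaway tree (hypothetical file names) so the sketch runs anywhere.
demo_tree = {"Parasitized": ["a.png", "b.png"], "Uninfected": ["c.png"]}
with tempfile.TemporaryDirectory() as root:
    for folder, files in demo_tree.items():
        os.makedirs(os.path.join(root, folder))
        for f in files:
            open(os.path.join(root, folder, f), "w").close()
    class_names, image_paths, image_classes = collect_labeled_paths(root)

print(class_names)
print(image_classes)
```

Sorting the directory listing keeps the class IDs stable between the training and validation runs, which matters because the classifier only ever sees the integer IDs.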
['images/cell_images/train\\Parasitized\\C37BP2_thinF_IMG_20150620_133111a_cell_87.png',
'images/cell_images/train\\Parasitized\\C37BP2_thinF_IMG_20150620_133111a_cell_88.png',
'images/cell_images/train\\Parasitized\\C37BP2_thinF_IMG_20150620_133205a_cell_87.png',
'images/cell_images/train\\Parasitized\\C37BP2_thinF_IMG_20150620_133205a_cell_88.png',
'images/cell_images/train\\Parasitized\\C37BP2_thinF_IMG_20150620_133238a_cell_97.png',
'images/cell_images/train\\Parasitized\\C38P3thinF_original_IMG_20150621_112043_cell_202.png',
'images/cell_images/train\\Parasitized\\C38P3thinF_original_IMG_20150621_112043_cell_203.png',
'images/cell_images/train\\Parasitized\\C38P3thinF_original_IMG_20150621_112116_cell_204.png',
'images/cell_images/train\\Parasitized\\C38P3thinF_original_IMG_20150621_112116_cell_205.png',
'images/cell_images/train\\Parasitized\\C38P3thinF_original_IMG_20150621_112138_cell_183.png',
'images/cell_images/train\\Uninfected\\C1_thinF_IMG_20150604_104919_cell_240.png',
'images/cell_images/train\\Uninfected\\C1_thinF_IMG_20150604_104942_cell_102.png',
'images/cell_images/train\\Uninfected\\C1_thinF_IMG_20150604_104942_cell_11.png',
'images/cell_images/train\\Uninfected\\C1_thinF_IMG_20150604_104942_cell_139.png',
'images/cell_images/train\\Uninfected\\C1_thinF_IMG_20150604_104942_cell_151.png',
'images/cell_images/train\\Uninfected\\C1_thinF_IMG_20150604_104942_cell_20.png',
'images/cell_images/train\\Uninfected\\C1_thinF_IMG_20150604_104942_cell_4.png',
'images/cell_images/train\\Uninfected\\C1_thinF_IMG_20150604_104942_cell_59.png',
'images/cell_images/train\\Uninfected\\C1_thinF_IMG_20150604_104942_cell_72.png',
'images/cell_images/train\\Uninfected\\C1_thinF_IMG_20150604_104942_cell_98.png']
- Two classes in total: Parasitized and Uninfected
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
2
- Create feature extraction and keypoint detector objects
- SIFT was not available in the default OpenCV build when the tutorial was recorded (since OpenCV 4.4.0 it is back in the main module as cv2.SIFT_create)
- Create a list where all the descriptors will be stored
Scale-invariant feature detection in OpenCV: SIFT, SURF, BRISK, ORB
- BRISK is a good replacement for SIFT. ORB also works but didn't perform well for this example
- Stack all the descriptors vertically in a numpy array
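One way to do the stacking, sketched on made-up descriptor arrays. The `(path, descriptors)` tuple layout and the file names are assumptions for illustration; images that produced no descriptors are skipped.

```python
import numpy as np

# Stand-ins for per-image descriptor arrays: (n_keypoints, 64) uint8 each.
des_list = [
    ("img_0.png", np.full((3, 64), 10, dtype=np.uint8)),
    ("img_1.png", None),  # an image where no keypoints were found
    ("img_2.png", np.full((2, 64), 200, dtype=np.uint8)),
]

# Skip images without descriptors, then stack everything in one call.
descriptors = np.vstack([des for _, des in des_list if des is not None])
print(descriptors.shape)  # (5, 64)
```

Passing the whole list to a single `np.vstack` call avoids re-copying the growing array on every iteration, which a stack-inside-a-loop approach would do.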
array([[244, 255, 223, ..., 0, 17, 48],
[254, 191, 247, ..., 8, 25, 0],
[240, 255, 255, ..., 137, 25, 0],
...,
[128, 255, 255, ..., 0, 0, 0],
[176, 255, 255, ..., 0, 0, 0],
[240, 255, 255, ..., 0, 0, 0]], dtype=uint8)
- k-means works only on floats, so convert the integer descriptors to float
- Perform k-means clustering and vector quantization
k-means builds the visual vocabulary here; for the final classification step, an SVM or a random forest can be used.
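A small sketch of building a vocabulary with `scipy.cluster.vq.kmeans`, assuming SciPy is the clustering library in use. The random descriptors and the tiny `k = 2` are stand-ins; a real vocabulary would use far more clusters.

```python
import numpy as np
from scipy.cluster.vq import kmeans

np.random.seed(0)  # makes the synthetic data below reproducible

# Stand-in for the stacked uint8 descriptors: two well-separated groups.
descriptors = np.vstack([
    np.random.randint(0, 50, size=(40, 64)),
    np.random.randint(200, 256, size=(40, 64)),
]).astype(np.uint8)

# Convert to floating point before clustering, as noted above.
descriptors_float = descriptors.astype(float)

k = 2  # vocabulary size (illustrative; far smaller than a real vocabulary)
# iter=1 runs k-means once; higher values keep the codebook with the lowest distortion.
voc, distortion = kmeans(descriptors_float, k, iter=1)
print(voc.shape)
```

The returned codebook `voc` has one row per visual word. SciPy may return fewer than `k` rows if a cluster ends up empty, so downstream code should not assume exactly `k`.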
- Calculate the histogram of features and represent them as a vector
- vq assigns codes from a code book to observations
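To make the histogram step concrete, here is a sketch on a hand-made codebook in a 2-D feature space. All values are invented for illustration; real BRISK descriptors would have 64 columns.

```python
import numpy as np
from scipy.cluster.vq import vq

# A tiny hand-made codebook with k = 3 "visual words".
voc = np.array([[0.0, 0.0], [10.0, 10.0], [20.0, 0.0]])
k = len(voc)

# Descriptors for two images (made-up values).
per_image_descriptors = [
    np.array([[1.0, 0.0], [9.0, 11.0], [0.0, 1.0]]),
    np.array([[19.0, 1.0], [21.0, -1.0]]),
]

im_features = np.zeros((len(per_image_descriptors), k), dtype="float32")
for i, des in enumerate(per_image_descriptors):
    words, distances = vq(des, voc)  # index of the nearest codebook row per descriptor
    im_features[i] = np.bincount(words, minlength=k)  # count how often each word occurs

print(im_features)
# [[2. 1. 0.]
#  [0. 0. 2.]]
```

Every image ends up as a fixed-length vector of word counts, regardless of how many keypoints it had, which is what lets a standard classifier consume it.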
array([ 48, 14, 24, 50, 86, 177, 199, 91, 24, 15, 21, 44, 86,
192, 71, 46, 193, 59, 154, 2, 80, 119, 43])
array([ 79.62537284, 76.25693411, 150.61976132, 0. ,
189.20699172, 167.46438427, 0. , 132.3697473 ,
95.40341975, 137.6727198 , 113.90895487, 104.85068749,
104.80526159, 0. , 170.24394262, 220.20785635,
118.6493433 , 77.81910113, 0. , 101.40636075,
217.89599966, 84.18283673, 133.43163043])
- Perform Tf-Idf vectorization
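A sketch of one common Tf-Idf weighting applied to word-count histograms. The smoothed formula `log((N + 1) / (n_w + 1))` is an assumed variant chosen for illustration, and the count matrix is made up.

```python
import numpy as np

# Rows are images, columns are visual words (counts from the histogram step).
im_features = np.array([[2, 0, 1],
                        [1, 0, 0]], dtype="float32")
n_images = im_features.shape[0]

# Number of images in which each word occurs at least once.
nbr_occurrences = np.sum(im_features > 0, axis=0)

# Smoothed inverse document frequency (assumed variant): log((N + 1) / (n_w + 1)).
idf = np.log((n_images + 1.0) / (nbr_occurrences + 1.0))

# Weight the raw counts: words that appear in every image contribute nothing.
tfidf = im_features * idf
print(nbr_occurrences)  # [2 0 1]
```

The first word appears in both images, so its weight is `log(1) = 0`: a visual word present everywhere carries no information for telling the classes apart.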
- Scale the words: standardize the features by removing the mean and scaling to unit variance, which is a form of normalization
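The scaling step can be sketched with scikit-learn's `StandardScaler`, assuming scikit-learn is available. The feature values are made up; the point is that the scaler is fitted on training data only and then reused on unseen data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up feature matrices: rows are images, columns are visual-word counts.
train_features = np.array([[1.0, 10.0],
                           [2.0, 20.0],
                           [3.0, 30.0]])
test_features = np.array([[2.0, 40.0]])

scaler = StandardScaler().fit(train_features)   # learns per-column mean and std
train_scaled = scaler.transform(train_features)
test_scaled = scaler.transform(test_features)   # reuses the training statistics

print(train_scaled.mean(axis=0))
print(train_scaled.std(axis=0))
```

Keeping the fitted scaler around is why it is worth saving alongside the classifier: validation images have to be scaled with the training statistics, not their own.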
- Train an algorithm to discriminate vectors corresponding to positive and negative training images
- Train the Linear SVM
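A minimal sketch of training a linear SVM, assuming scikit-learn's `LinearSVC` as the implementation. The six 2-D points stand in for the scaled histogram vectors, and `max_iter=10000` is just a generous illustrative limit.

```python
import numpy as np
from sklearn.svm import LinearSVC

# Made-up, linearly separable feature vectors with integer class IDs.
X = np.array([[-2.0, -1.0], [-1.0, -2.0], [-1.5, -1.5],
              [1.0, 2.0], [2.0, 1.0], [1.5, 1.5]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = LinearSVC(max_iter=10000)  # raise max_iter if a convergence warning appears
clf.fit(X, y)

predicted = clf.predict(np.array([[-1.0, -1.0], [1.0, 1.0]]))
print(predicted)
```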
- Save the SVM
- Joblib dumps a Python object into one file
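A sketch of the save-and-reload round trip with `joblib`, assuming the standalone `joblib` package. The bundled objects are placeholders (including the cluster count of 200), and a temporary directory is used so that nothing is left on disk.

```python
import os
import tempfile

import joblib

# Placeholders for the trained objects; any picklable Python objects work.
# The cluster count 200 is a hypothetical value.
bundle = ({"model": "placeholder"}, ["Parasitized", "Uninfected"], 200)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "bovw.pkl")
    written = joblib.dump(bundle, path, compress=3)  # returns the list of files written
    restored = joblib.load(path)

print(restored[1])
```

Bundling everything the validation script needs into one tuple means a single `joblib.load` call restores the classifier together with the class names, scaler, cluster count and vocabulary.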
['bovw.pkl']
Validate_BOVW
- Load the classifier, class names, scaler, number of clusters and vocabulary from the stored pickle file (generated during training)
- If you use the training folder here instead of the test folder, the accuracy looks great, because the classifier has already seen those images
- Up to this point, most of the code is similar to the training script, except for the k-means clustering
- Report true class names so they can be compared with predicted classes
- Perform the predictions and report predicted class names.
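Turning integer class IDs back into names is a plain list lookup. In this sketch all three input lists are hypothetical: `classes_names` stands for the names saved during training, `image_classes` for the IDs gathered from the test folders, and `predicted_ids` for the classifier's output.

```python
# Hypothetical values for illustration only.
classes_names = ["Parasitized", "Uninfected"]
image_classes = [0, 0, 1, 1]   # true IDs, taken from the folder each image sits in
predicted_ids = [0, 1, 1, 1]   # IDs as a classifier might return them

# Map each integer ID to its class name.
true_class = [classes_names[i] for i in image_classes]
prediction = [classes_names[i] for i in predicted_ids]

print("true_class =" + str(true_class))
print("prediction =" + str(prediction))
```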
- Print the true class and Predictions
true_class =['Parasitized', 'Parasitized', 'Parasitized', 'Parasitized', 'Parasitized', 'Parasitized', 'Parasitized', 'Parasitized', 'Parasitized', 'Parasitized', 'Uninfected', 'Uninfected', 'Uninfected', 'Uninfected', 'Uninfected', 'Uninfected', 'Uninfected', 'Uninfected', 'Uninfected', 'Uninfected']
prediction =['Parasitized', 'Parasitized', 'Uninfected', 'Parasitized', 'Uninfected', 'Parasitized', 'Uninfected', 'Uninfected', 'Parasitized', 'Uninfected', 'Parasitized', 'Uninfected', 'Uninfected', 'Uninfected', 'Uninfected', 'Uninfected', 'Uninfected', 'Uninfected', 'Uninfected', 'Uninfected']
- To make it easy to understand the accuracy let us print the confusion matrix
accuracy = 0.7
[[5 5]
[1 9]]
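The accuracy and confusion matrix above can be reproduced from the printed true_class and prediction lists. This sketch assumes scikit-learn's `accuracy_score` and `confusion_matrix`; the lists are copied from the output shown earlier.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Copied from the printed output: 10 Parasitized images followed by 10 Uninfected.
true_class = ["Parasitized"] * 10 + ["Uninfected"] * 10
prediction = ["Parasitized", "Parasitized", "Uninfected", "Parasitized", "Uninfected",
              "Parasitized", "Uninfected", "Uninfected", "Parasitized", "Uninfected",
              "Parasitized"] + ["Uninfected"] * 9

accuracy = accuracy_score(true_class, prediction)
cm = confusion_matrix(true_class, prediction)  # rows: true class, columns: predicted class

print("accuracy =", accuracy)
print(cm)
```

Reading the matrix row by row: of the 10 parasitized cells, 5 were recognized and 5 were missed; of the 10 uninfected cells, 9 were recognized and 1 was flagged as parasitized. That gives 14 correct out of 20, which is the 0.7 accuracy.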
If traditional methods (SVM, K-Means, Random Forest) still do not give good accuracy, consider techniques such as deep neural networks.